Overview

Dataset statistics

Number of variables 14
Number of observations 8886058
Missing cells 35271795
Missing cells (%) 28.4%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 949.1 MiB
Average record size in memory 112.0 B
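
The headline figures above can be cross-checked against the per-column missing counts listed under Alerts. The sketch below is only a consistency check on numbers copied from this report; the variable names are illustrative, and the 8-bytes-per-value storage is an assumption that happens to reproduce the reported sizes.

```python
# Figures copied from the report; names are illustrative, not from any library.
N_ROWS = 8_886_058
N_COLS = 14

# Per-column missing counts as listed in the Alerts section.
missing_by_column = {
    "sales": 302_296,
    "revenue": 302_296,
    "stock": 302_296,
    "price": 91_381,
    "promo_bin_1": 7_653_515,
    "promo_bin_2": 8_873_337,
    "promo_discount_2": 8_873_337,
    "promo_discount_type_2": 8_873_337,
}

total_cells = N_ROWS * N_COLS
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells

# Assumption: each column stores one 8-byte value (number or pointer) per row.
bytes_per_row = 8 * N_COLS
column_mib = 8 * N_ROWS / 2**20
total_mib = column_mib * N_COLS

print(missing_cells, round(missing_pct, 1))
print(bytes_per_row, round(column_mib, 1), round(total_mib, 1))
```

Under that assumption the per-column counts sum to the reported number of missing cells, and the record and column sizes match the values shown for each variable.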

Variable types

Numeric 6
Text 2
DateTime 1
Categorical 5

Alerts

promo_bin_1 is highly overall correlated with promo_bin_2 and 1 other field [High correlation]
promo_bin_2 is highly overall correlated with promo_bin_1 and 3 other fields [High correlation]
promo_discount_2 is highly overall correlated with promo_bin_2 and 2 other fields [High correlation]
promo_discount_type_2 is highly overall correlated with promo_bin_1 and 3 other fields [High correlation]
promo_type_2 is highly overall correlated with promo_bin_2 and 2 other fields [High correlation]
revenue is highly overall correlated with sales [High correlation]
sales is highly overall correlated with revenue [High correlation]
promo_type_1 is highly imbalanced (77.3%) [Imbalance]
promo_type_2 is highly imbalanced (99.1%) [Imbalance]
sales has 302296 (3.4%) missing values [Missing]
revenue has 302296 (3.4%) missing values [Missing]
stock has 302296 (3.4%) missing values [Missing]
price has 91381 (1.0%) missing values [Missing]
promo_bin_1 has 7653515 (86.1%) missing values [Missing]
promo_bin_2 has 8873337 (99.9%) missing values [Missing]
promo_discount_2 has 8873337 (99.9%) missing values [Missing]
promo_discount_type_2 has 8873337 (99.9%) missing values [Missing]
sales is highly skewed (γ1 = 1557.844936) [Skewed]
revenue is highly skewed (γ1 = 815.4548181) [Skewed]
stock is highly skewed (γ1 = 24.21927272) [Skewed]
Unnamed: 0 is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
sales has 7048907 (79.3%) zeros [Zeros]
revenue has 7049979 (79.3%) zeros [Zeros]
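
The skewness alerts report γ1, the standardized third moment. As a rough illustration of why a zero-dominated column with a few large values scores so high, the sketch below computes the plain population version of γ1 on made-up toy data. The profiling tool may apply a small-sample correction, so this is not claimed to be its exact formula.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Toy data invented for illustration; not taken from the dataset.
symmetric = [1, 2, 3, 4, 5]
zero_inflated = [0] * 95 + [1, 1, 2, 3, 500]

print(skewness(symmetric))      # symmetric data: skewness of zero
print(skewness(zero_inflated))  # one large value among many zeros: strongly positive
```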

Reproduction

Analysis started 2024-06-25 14:46:13.167959
Analysis finished 2024-06-25 15:04:01.734227
Duration 17 minutes and 48.57 seconds
Software version ydata-profiling v0.0.dev0
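
The reported duration follows directly from the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-06-25 14:46:13.167959", FMT)
finished = datetime.strptime("2024-06-25 15:04:01.734227", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```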

Variables

Unnamed: 0
Real number (ℝ)

UNIFORM  UNIQUE 

Distinct 8886058
Distinct (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 4443029.5
Minimum 1
Maximum 8886058
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 67.8 MiB

Quantile statistics

Minimum 1
5-th percentile 444303.85
Q1 2221515.2
median 4443029.5
Q3 6664543.8
95-th percentile 8441755.2
Maximum 8886058
Range 8886057
Interquartile range (IQR) 4443028.5

Descriptive statistics

Standard deviation 2565184.1
Coefficient of variation (CV) 0.57735024
Kurtosis -1.2
Mean 4443029.5
Median Absolute Deviation (MAD) 2221514.5
Skewness -1.3646306 × 10^-15
Sum 3.9481018 × 10^13
Variance 6.5801696 × 10^12
Monotonicity Strictly increasing
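
Because this column is simply the row counter 1..N, its statistics have closed forms. The sketch below recomputes them from N alone, assuming the variance uses the sample convention (ddof = 1) and the quantiles use linear interpolation; both assumptions are consistent with the figures above.

```python
from math import sqrt

N = 8_886_058  # number of observations; the index runs from 1 to N

mean = (N + 1) / 2
total = N * (N + 1) // 2
sample_variance = N * (N + 1) / 12      # variance of 1..N with ddof=1
std = sqrt(sample_variance)
cv = std / mean                         # tends to 1/sqrt(3) for large N

# Linear-interpolation quantiles of an evenly spaced sequence.
p05 = 1 + 0.05 * (N - 1)
q1 = 1 + 0.25 * (N - 1)
q3 = 1 + 0.75 * (N - 1)
iqr = q3 - q1

print(mean, total, f"{sample_variance:.7e}", round(std, 1), round(cv, 8))
print(p05, q1, q3, iqr)
```

The coefficient of variation near 0.577 and the kurtosis of -1.2 are the values expected for a uniform distribution, which is what the Uniform alert reflects.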
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
1                       1        < 0.1%
5924034                 1        < 0.1%
5924048                 1        < 0.1%
5924047                 1        < 0.1%
5924046                 1        < 0.1%
5924045                 1        < 0.1%
5924044                 1        < 0.1%
5924043                 1        < 0.1%
5924042                 1        < 0.1%
5924041                 1        < 0.1%
Other values (8886048)  8886048  > 99.9%

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Value    Count  Frequency (%)
8886058  1      < 0.1%
8886057  1      < 0.1%
8886056  1      < 0.1%
8886055  1      < 0.1%
8886054  1      < 0.1%
8886053  1      < 0.1%
8886052  1      < 0.1%
8886051  1      < 0.1%
8886050  1      < 0.1%
8886049  1      < 0.1%

store_id
Text

Distinct 63
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 67.8 MiB

Length

Max length 5
Median length 5
Mean length 5
Min length 5

Characters and Unicode

Total characters 44430290
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row S0002
2nd row S0002
3rd row S0002
4th row S0002
5th row S0002
Value              Count    Frequency (%)
s0038              334082   3.8%
s0085              325409   3.7%
s0097              279019   3.1%
s0094              276217   3.1%
s0104              271338   3.1%
s0062              267921   3.0%
s0026              266261   3.0%
s0056              260416   2.9%
s0020              253996   2.9%
s0108              249346   2.8%
Other values (53)  6102053  68.7%

Most occurring characters

Value  Count     Frequency (%)
0      17984981  40.5%
S      8886058   20.0%
1      3355402   7.6%
2      3330114   7.5%
5      2016937   4.5%
8      1722869   3.9%
6      1692194   3.8%
3      1655994   3.7%
4      1418789   3.2%
9      1281762   2.9%


product_id
Text

Distinct 615
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 67.8 MiB

Length

Max length 5
Median length 5
Mean length 5
Min length 5

Characters and Unicode

Total characters 44430290
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row P0001
2nd row P0005
3rd row P0011
4th row P0015
5th row P0017
Value               Count    Frequency (%)
p0664               59051    0.7%
p0125               58708    0.7%
p0261               58504    0.7%
p0364               58428    0.7%
p0131               58117    0.7%
p0694               57956    0.7%
p0116               57940    0.7%
p0390               57872    0.7%
p0372               57699    0.6%
p0333               57473    0.6%
Other values (605)  8304310  93.5%

Most occurring characters

Value  Count     Frequency (%)
0      11721008  26.4%
P      8886058   20.0%
1      3416438   7.7%
6      3077549   6.9%
4      2994432   6.7%
5      2974769   6.7%
2      2926654   6.6%
3      2851268   6.4%
7      2273576   5.1%
9      1760049   4.0%


date
Date

Distinct 1033
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 67.8 MiB
Minimum 2017-01-02 00:00:00
Maximum 2019-10-31 00:00:00
Histogram with fixed size bins (bins=50)

sales
Real number (ℝ)

HIGH CORRELATION   MISSING  SKEWED  ZEROS 

Distinct 5435
Distinct (%) 0.1%
Missing 302296
Missing (%) 3.4%
Infinite 0
Infinite (%) 0.0%
Mean 0.47340804
Minimum 0
Maximum 43301
Zeros 7048907
Zeros (%) 79.3%
Negative 0
Negative (%) 0.0%
Memory size 67.8 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 2
Maximum 43301
Range 43301
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 21.290586
Coefficient of variation (CV) 44.973012
Kurtosis 2698722.2
Mean 0.47340804
Median Absolute Deviation (MAD) 0
Skewness 1557.8449
Sum 4063622
Variance 453.28904
Monotonicity Not monotonic
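
The sales statistics use two different denominators: the mean is taken over non-missing rows only, while the zero and missing percentages are relative to all rows. The following sketch recomputes the derived figures from the counts reported above; the variable names are illustrative.

```python
N_ROWS = 8_886_058
MISSING = 302_296
ZEROS = 7_048_907
TOTAL_SALES = 4_063_622   # "Sum" in the descriptive statistics
STD = 21.290586

valid = N_ROWS - MISSING            # rows that actually carry a sales value
mean = TOTAL_SALES / valid          # mean ignores missing rows
cv = STD / mean                     # coefficient of variation

zeros_pct_all = 100 * ZEROS / N_ROWS    # share reported in the alert
zeros_pct_valid = 100 * ZEROS / valid   # share among non-missing rows

print(round(mean, 8), round(cv, 3))
print(round(zeros_pct_all, 1), round(zeros_pct_valid, 1))
```

Among rows that have a value at all, the share of zeros is therefore somewhat higher than the 79.3% shown in the alert.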
Histogram with fixed size bins (bins=50)

Value                Count    Frequency (%)
0                    7048907  79.3%
1                    848271   9.5%
2                    298996   3.4%
3                    130620   1.5%
4                    73361    0.8%
5                    43412    0.5%
6                    30100    0.3%
7                    19260    0.2%
8                    14027    0.2%
9                    10168    0.1%
Other values (5425)  66640    0.7%
(Missing)            302296   3.4%

Value  Count    Frequency (%)
0      7048907  79.3%
0.018  1        < 0.1%
0.022  1        < 0.1%
0.024  1        < 0.1%
0.03   1        < 0.1%
0.032  1        < 0.1%
0.034  1        < 0.1%
0.038  1        < 0.1%
0.042  1        < 0.1%
0.044  2        < 0.1%

Value  Count  Frequency (%)
43301  1      < 0.1%
27656  1      < 0.1%
27652  1      < 0.1%
13828  1      < 0.1%
13826  1      < 0.1%
6408   1      < 0.1%
1801   1      < 0.1%
1720   1      < 0.1%
1000   3      < 0.1%
816    1      < 0.1%

revenue
Real number (ℝ)

HIGH CORRELATION   MISSING  SKEWED  ZEROS 

Distinct 12155
Distinct (%) 0.1%
Missing 302296
Missing (%) 3.4%
Infinite 0
Infinite (%) 0.0%
Mean 2.285173
Minimum 0
Maximum 84197.961
Zeros 7049979
Zeros (%) 79.3%
Negative 0
Negative (%) 0.0%
Memory size 67.8 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 11.76
Maximum 84197.961
Range 84197.961
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 54.06806
Coefficient of variation (CV) 23.66038
Kurtosis 966651.01
Mean 2.285173
Median Absolute Deviation (MAD) 0
Skewness 815.45482
Sum 19615381
Variance 2923.3551
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count    Frequency (%)
0                     7049979  79.3%
0.93                  33675    0.4%
3.24                  27591    0.3%
1.85                  25154    0.3%
2.31                  23568    0.3%
2.78                  23341    0.3%
2.73                  18024    0.2%
1.39                  17439    0.2%
1.16                  16214    0.2%
3.66                  15914    0.2%
Other values (12145)  1332863  15.0%
(Missing)             302296   3.4%

Value  Count    Frequency (%)
0      7049979  79.3%
0.01   158      < 0.1%
0.02   16       < 0.1%
0.03   10       < 0.1%
0.05   2        < 0.1%
0.06   1        < 0.1%
0.1    1        < 0.1%
0.23   537      < 0.1%
0.25   1        < 0.1%
0.27   2        < 0.1%

Value      Count  Frequency (%)
84197.961  1      < 0.1%
52496.852  1      < 0.1%
52488.699  1      < 0.1%
32490.51   1      < 0.1%
31150      1      < 0.1%
30327.01   1      < 0.1%
26711.859  1      < 0.1%
26247.59   1      < 0.1%
26243.801  1      < 0.1%
25423.73   1      < 0.1%

stock
Real number (ℝ)

MISSING  SKEWED 

Distinct 9039
Distinct (%) 0.1%
Missing 302296
Missing (%) 3.4%
Infinite 0
Infinite (%) 0.0%
Mean 16.005747
Minimum 0
Maximum 4655
Zeros 66086
Zeros (%) 0.7%
Negative 0
Negative (%) 0.0%
Memory size 67.8 MiB

Quantile statistics

Minimum 0
5-th percentile 1
Q1 4
median 8
Q3 17
95-th percentile 48
Maximum 4655
Range 4655
Interquartile range (IQR) 13

Descriptive statistics

Standard deviation 37.516921
Coefficient of variation (CV) 2.3439656
Kurtosis 1418.7495
Mean 16.005747
Median Absolute Deviation (MAD) 5
Skewness 24.219273
Sum 1.3738952 × 10^8
Variance 1407.5194
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count    Frequency (%)
4                    619203   7.0%
3                    617176   6.9%
6                    600701   6.8%
2                    585179   6.6%
5                    571932   6.4%
1                    464188   5.2%
7                    435351   4.9%
8                    387802   4.4%
9                    354054   4.0%
12                   353011   4.0%
Other values (9029)  3595165  40.5%

Value  Count  Frequency (%)
0      66086  0.7%
0.001  38     < 0.1%
0.002  53     < 0.1%
0.003  65     < 0.1%
0.004  323    < 0.1%
0.005  333    < 0.1%
0.006  30     < 0.1%
0.007  15     < 0.1%
0.008  25     < 0.1%
0.009  11     < 0.1%

Value  Count  Frequency (%)
4655   1      < 0.1%
4582   1      < 0.1%
4473   1      < 0.1%
4404   1      < 0.1%
4384   1      < 0.1%
4320   1      < 0.1%
4308   1      < 0.1%
4292   1      < 0.1%
4273   1      < 0.1%
4243   1      < 0.1%

price
Real number (ℝ)

MISSING 

Distinct 606
Distinct (%) < 0.1%
Missing 91381
Missing (%) 1.0%
Infinite 0
Infinite (%) 0.0%
Mean 15.753767
Minimum 0.01
Maximum 1599
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 67.8 MiB

Quantile statistics

Minimum 0.01
5-th percentile 1
Q1 3.45
median 8
Q3 16.95
95-th percentile 53.9
Maximum 1599
Range 1598.99
Interquartile range (IQR) 13.5

Descriptive statistics

Standard deviation 32.77869
Coefficient of variation (CV) 2.0806891
Kurtosis 521.81016
Mean 15.753767
Median Absolute Deviation (MAD) 5.5
Skewness 16.550523
Sum 1.3854929 × 10^8
Variance 1074.4425
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count    Frequency (%)
1                   243419   2.7%
3.95                156115   1.8%
3.5                 147047   1.7%
0.75                146503   1.6%
2.95                137147   1.5%
19.9                133650   1.5%
11.9                131649   1.5%
1.75                116461   1.3%
12.9                115990   1.3%
2.5                 109490   1.2%
Other values (596)  7357206  82.8%

Value  Count  Frequency (%)
0.01   136    < 0.1%
0.25   7983   0.1%
0.3    107    < 0.1%
0.35   237    < 0.1%
0.4    1175   < 0.1%
0.45   12344  0.1%
0.5    45814  0.5%
0.58   512    < 0.1%
0.6    24658  0.3%
0.65   50459  0.6%

Value  Count  Frequency (%)
1599   174    < 0.1%
1549   115    < 0.1%
1499   160    < 0.1%
1449   95     < 0.1%
1399   127    < 0.1%
1349   174    < 0.1%
849.9  574    < 0.1%
749.9  8      < 0.1%
749    63     < 0.1%
699.9  19     < 0.1%

promo_type_1
Categorical

IMBALANCE 

Distinct 17
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 67.8 MiB
PR14               7653515
PR05               547253
PR10               213664
PR03               151863
PR06               124289
Other values (12)  195474

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 35544232
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row PR14
2nd row PR14
3rd row PR14
4th row PR14
5th row PR14

Common Values

Value             Count    Frequency (%)
PR14              7653515  86.1%
PR05              547253   6.2%
PR10              213664   2.4%
PR03              151863   1.7%
PR06              124289   1.4%
PR07              57419    0.6%
PR12              40840    0.5%
PR09              35752    0.4%
PR17              32863    0.4%
PR01              12618    0.1%
Other values (7)  15982    0.2%

Length

Histogram of lengths of the category

Value             Count    Frequency (%)
pr14              7653515  86.1%
pr05              547253   6.2%
pr10              213664   2.4%
pr03              151863   1.7%
pr06              124289   1.4%
pr07              57419    0.6%
pr12              40840    0.5%
pr09              35752    0.4%
pr17              32863    0.4%
pr01              12618    0.1%
Other values (7)  15982    0.2%

Most occurring characters

Value             Count    Frequency (%)
P                 8886058  25.0%
R                 8886058  25.0%
1                 7966930  22.4%
4                 7656898  21.5%
0                 1150417  3.2%
5                 547272   1.5%
3                 152470   0.4%
6                 125201   0.4%
7                 90282    0.3%
2                 40840    0.1%
Other values (2)  41806    0.1%


promo_bin_1
Categorical

HIGH CORRELATION   MISSING 

Distinct 5
Distinct (%) < 0.1%
Missing 7653515
Missing (%) 86.1%
Memory size 67.8 MiB
verylow   514398
low       259135
moderate  193475
high      146120
veryhigh  119415

Length

Max length 8
Median length 7
Mean length 6.0572256
Min length 3

Characters and Unicode

Total characters 7465791
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row verylow
2nd row moderate
3rd row low
4th row high
5th row low

Common Values

Value      Count    Frequency (%)
verylow    514398   5.8%
low        259135   2.9%
moderate   193475   2.2%
high       146120   1.6%
veryhigh   119415   1.3%
(Missing)  7653515  86.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count   Frequency (%)
verylow   514398  41.7%
low       259135  21.0%
moderate  193475  15.7%
high      146120  11.9%
veryhigh  119415  9.7%
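
The two frequency tables for promo_bin_1 differ only in their denominator: the first is relative to all rows (so it includes a Missing row), the second to non-missing rows only. A short sketch that reproduces both sets of percentages from the counts above:

```python
N_ROWS = 8_886_058
counts = {
    "verylow": 514_398,
    "low": 259_135,
    "moderate": 193_475,
    "high": 146_120,
    "veryhigh": 119_415,
}

present = sum(counts.values())   # rows with a promo_bin_1 value
missing = N_ROWS - present       # should equal the reported missing count

share_of_all = {k: round(100 * v / N_ROWS, 1) for k, v in counts.items()}
share_of_present = {k: round(100 * v / present, 1) for k, v in counts.items()}

print(missing)
print(share_of_all)
print(share_of_present)
```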

Most occurring characters

Value             Count    Frequency (%)
e                 1020763  13.7%
o                 967008   13.0%
r                 827288   11.1%
l                 773533   10.4%
w                 773533   10.4%
v                 633813   8.5%
y                 633813   8.5%
h                 531070   7.1%
i                 265535   3.6%
g                 265535   3.6%
Other values (4)  773900   10.4%


promo_type_2
Categorical

HIGH CORRELATION   IMBALANCE 

Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 67.8 MiB
PR03  8873337
PR02  7026
PR04  2892
PR01  2803

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 35544232
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row PR03
2nd row PR03
3rd row PR03
4th row PR03
5th row PR03

Common Values

Value  Count    Frequency (%)
PR03   8873337  99.9%
PR02   7026     0.1%
PR04   2892     < 0.1%
PR01   2803     < 0.1%
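
The Imbalance alert reports 99.1% for this variable. One definition that reproduces that figure from the four counts above is one minus the Shannon entropy divided by its maximum for the number of classes. The sketch below assumes that definition; the function name is illustrative and not taken from the profiling library.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * log2(p) for p in probs)
    return 1 - entropy / log2(n_classes)

# Counts for PR03, PR02, PR04, PR01 as listed in Common Values.
promo_type_2_counts = [8_873_337, 7_026, 2_892, 2_803]
score = imbalance_score(promo_type_2_counts)
print(f"{100 * score:.1f}%")
```

A perfectly balanced variable scores 0 under this definition, and a variable where nearly every row shares one value approaches 1.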

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count    Frequency (%)
pr03   8873337  99.9%
pr02   7026     0.1%
pr04   2892     < 0.1%
pr01   2803     < 0.1%

Most occurring characters

Value  Count    Frequency (%)
P      8886058  25.0%
R      8886058  25.0%
0      8886058  25.0%
3      8873337  25.0%
2      7026     < 0.1%
4      2892     < 0.1%
1      2803     < 0.1%


promo_bin_2
Categorical

HIGH CORRELATION   MISSING 

Distinct 3
Distinct (%) < 0.1%
Missing 8873337
Missing (%) 99.9%
Memory size 67.8 MiB
verylow   6441
high      3637
veryhigh  2643

Length

Max length 8
Median length 7
Mean length 6.3500511
Min length 4

Characters and Unicode

Total characters 80779
Distinct characters 10
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row verylow
2nd row verylow
3rd row verylow
4th row verylow
5th row verylow

Common Values

Value      Count    Frequency (%)
verylow    6441     0.1%
high       3637     < 0.1%
veryhigh   2643     < 0.1%
(Missing)  8873337  99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
verylow   6441   50.6%
high      3637   28.6%
veryhigh  2643   20.8%

Most occurring characters

Value  Count  Frequency (%)
h      12560  15.5%
v      9084   11.2%
e      9084   11.2%
r      9084   11.2%
y      9084   11.2%
l      6441   8.0%
o      6441   8.0%
w      6441   8.0%
i      6280   7.8%
g      6280   7.8%


promo_discount_2
Real number (ℝ)

HIGH CORRELATION   MISSING 

Distinct 6
Distinct (%) < 0.1%
Missing 8873337
Missing (%) 99.9%
Infinite 0
Infinite (%) 0.0%
Mean 30.110605
Minimum 16
Maximum 50
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 67.8 MiB

Quantile statistics

Minimum 16
5-th percentile 20
Q1 20
median 20
Q3 35
95-th percentile 50
Maximum 50
Range 34
Interquartile range (IQR) 15

Descriptive statistics

Standard deviation 11.8509
Coefficient of variation (CV) 0.39357893
Kurtosis -1.0464257
Mean 30.110605
Median Absolute Deviation (MAD) 4
Skewness 0.66762654
Sum 383037
Variance 140.44382
Monotonicity Not monotonic
Histogram with fixed size bins (bins=6)

Value      Count    Frequency (%)
20         6226     0.1%
33         2804     < 0.1%
50         2643     < 0.1%
35         585      < 0.1%
40         248      < 0.1%
16         215      < 0.1%
(Missing)  8873337  99.9%

Value  Count  Frequency (%)
16     215    < 0.1%
20     6226   0.1%
33     2804   < 0.1%
35     585    < 0.1%
40     248    < 0.1%
50     2643   < 0.1%

Value  Count  Frequency (%)
50     2643   < 0.1%
40     248    < 0.1%
35     585    < 0.1%
33     2804   < 0.1%
20     6226   0.1%
16     215    < 0.1%
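
With only six distinct values, the descriptive statistics for promo_discount_2 can be recomputed from the value counts alone. The sketch below does this, assuming the variance uses the sample convention (ddof = 1), which matches the reported figure.

```python
# Value counts copied from the tables above (non-missing rows only).
value_counts = {16: 215, 20: 6226, 33: 2804, 35: 585, 40: 248, 50: 2643}

n = sum(value_counts.values())
total = sum(v * c for v, c in value_counts.items())
mean = total / n

# Sample variance (ddof=1), weighted by the counts.
variance = sum(c * (v - mean) ** 2 for v, c in value_counts.items()) / (n - 1)

# Expand to a sorted list to read off the median; n is odd here.
expanded = sorted(v for v, c in value_counts.items() for _ in range(c))
median = expanded[n // 2]

# Median absolute deviation around that median.
abs_dev = sorted(abs(v - median) for v in expanded)
mad = abs_dev[n // 2]

print(n, total, round(mean, 6), round(variance, 5), median, mad)
```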

promo_discount_type_2
Categorical

HIGH CORRELATION   MISSING 

Distinct 4
Distinct (%) < 0.1%
Missing 8873337
Missing (%) 99.9%
Memory size 67.8 MiB
PR01  3762
PR02  3648
PR04  2793
PR03  2518

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 50884
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row PR04
2nd row PR02
3rd row PR04
4th row PR02
5th row PR02

Common Values

Value      Count    Frequency (%)
PR01       3762     < 0.1%
PR02       3648     < 0.1%
PR04       2793     < 0.1%
PR03       2518     < 0.1%
(Missing)  8873337  99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
pr01   3762   29.6%
pr02   3648   28.7%
pr04   2793   22.0%
pr03   2518   19.8%

Most occurring characters

Value  Count  Frequency (%)
P      12721  25.0%
R      12721  25.0%
0      12721  25.0%
1      3762   7.4%
2      3648   7.2%
4      2793   5.5%
3      2518   4.9%


Interactions

[36 pairwise interaction plots; images not included]

Correlations

                       Unnamed: 0  price   promo_bin_1  promo_bin_2  promo_discount_2  promo_discount_type_2  promo_type_1  promo_type_2  revenue  sales   stock
Unnamed: 0             1.000       0.025   0.021        0.412        0.150             0.309                  0.013         0.031         -0.039   -0.039  -0.011
price                  0.025       1.000   0.043        0.078        -0.138            0.100                  0.064         0.002         -0.208   -0.252  -0.360
promo_bin_1            0.021       0.043   1.000        0.872        -0.311            0.832                  0.430         0.029         0.059    0.062   0.088
promo_bin_2            0.412       0.078   0.872        1.000        -0.778            0.904                  0.303         0.885         -0.010   -0.014  -0.088
promo_discount_2       0.150       -0.138  -0.311       -0.778       1.000             0.777                  0.348         0.985         0.053    0.054   0.031
promo_discount_type_2  0.309       0.100   0.832        0.904        0.777             1.000                  0.265         0.896         0.220    0.237   0.215
promo_type_1           0.013       0.064   0.430        0.303        0.348             0.265                  1.000         0.013         -0.037   -0.031  0.002
promo_type_2           0.031       0.002   0.029        0.885        0.985             0.896                  0.013         1.000         -0.004   -0.003  -0.000
revenue                -0.039      -0.208  0.059        -0.010       0.053             0.220                  -0.037        -0.004        1.000    0.992   0.184
sales                  -0.039      -0.252  0.062        -0.014       0.054             0.237                  -0.031        -0.003        0.992    1.000   0.202
stock                  -0.011      -0.360  0.088        -0.088       0.031             0.215                  0.002         -0.000        0.184    0.202   1.000
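
The High correlation alerts can be related back to this matrix. The sketch below flags every off-diagonal pair whose absolute coefficient is at least 0.5; that threshold is an assumption, chosen because it reproduces exactly the partner counts stated in the Alerts section.

```python
NAMES = [
    "Unnamed: 0", "price", "promo_bin_1", "promo_bin_2", "promo_discount_2",
    "promo_discount_type_2", "promo_type_1", "promo_type_2", "revenue",
    "sales", "stock",
]

# Coefficients copied row by row from the matrix above.
MATRIX = [
    [1.000, 0.025, 0.021, 0.412, 0.150, 0.309, 0.013, 0.031, -0.039, -0.039, -0.011],
    [0.025, 1.000, 0.043, 0.078, -0.138, 0.100, 0.064, 0.002, -0.208, -0.252, -0.360],
    [0.021, 0.043, 1.000, 0.872, -0.311, 0.832, 0.430, 0.029, 0.059, 0.062, 0.088],
    [0.412, 0.078, 0.872, 1.000, -0.778, 0.904, 0.303, 0.885, -0.010, -0.014, -0.088],
    [0.150, -0.138, -0.311, -0.778, 1.000, 0.777, 0.348, 0.985, 0.053, 0.054, 0.031],
    [0.309, 0.100, 0.832, 0.904, 0.777, 1.000, 0.265, 0.896, 0.220, 0.237, 0.215],
    [0.013, 0.064, 0.430, 0.303, 0.348, 0.265, 1.000, 0.013, -0.037, -0.031, 0.002],
    [0.031, 0.002, 0.029, 0.885, 0.985, 0.896, 0.013, 1.000, -0.004, -0.003, -0.000],
    [-0.039, -0.208, 0.059, -0.010, 0.053, 0.220, -0.037, -0.004, 1.000, 0.992, 0.184],
    [-0.039, -0.252, 0.062, -0.014, 0.054, 0.237, -0.031, -0.003, 0.992, 1.000, 0.202],
    [-0.011, -0.360, 0.088, -0.088, 0.031, 0.215, 0.002, -0.000, 0.184, 0.202, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off for a "high" correlation

def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables with |r| >= threshold."""
    flagged = {}
    for i, name in enumerate(names):
        partners = [
            names[j]
            for j, r in enumerate(matrix[i])
            if j != i and abs(r) >= threshold
        ]
        if partners:
            flagged[name] = partners
    return flagged

flagged = high_correlations(NAMES, MATRIX, THRESHOLD)
for name, partners in flagged.items():
    print(f"{name}: {', '.join(partners)}")
```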

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
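
As a rough illustration of nullity correlation, the sketch below computes the Pearson correlation between the is-missing indicators of two columns. The toy columns are invented for the example (None marks a missing value); they only mimic the pattern suggested by the report, where sales and stock share the same missing count.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-missing indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is never or always missing
    return cov / sqrt(var_a * var_b)

# Toy columns, not taken from the dataset.
sales = [0.0, 1.0, None, None, 2.0, 0.0]
stock = [8.0, 19.0, None, None, 15.0, 11.0]
price = [6.25, 2.60, 9.75, None, 2.45, 1.49]

print(nullity_correlation(sales, stock))  # missing in exactly the same rows
print(nullity_correlation(sales, price))  # only partly overlapping
```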

Sample

         Unnamed: 0  store_id  product_id  date        sales  revenue  stock  price   promo_type_1  promo_bin_1  promo_type_2  promo_bin_2  promo_discount_2  promo_discount_type_2
0        1           S0002     P0001       2017-01-02  0.0    0.00     8.0    6.25    PR14          NaN          PR03          NaN          NaN               NaN
1        2           S0002     P0005       2017-01-02  0.0    0.00     11.0   33.90   PR14          NaN          PR03          NaN          NaN               NaN
2        3           S0002     P0011       2017-01-02  0.0    0.00     9.0    49.90   PR14          NaN          PR03          NaN          NaN               NaN
3        4           S0002     P0015       2017-01-02  1.0    2.41     19.0   2.60    PR14          NaN          PR03          NaN          NaN               NaN
4        5           S0002     P0017       2017-01-02  0.0    0.00     12.0   1.49    PR14          NaN          PR03          NaN          NaN               NaN
5        6           S0002     P0018       2017-01-02  1.0    1.81     37.0   1.95    PR14          NaN          PR03          NaN          NaN               NaN
6        7           S0002     P0024       2017-01-02  0.0    0.00     36.0   1.95    PR14          NaN          PR03          NaN          NaN               NaN
7        8           S0002     P0035       2017-01-02  2.0    4.54     15.0   2.45    PR14          NaN          PR03          NaN          NaN               NaN
8        9           S0002     P0046       2017-01-02  0.0    0.00     11.0   34.50   PR14          NaN          PR03          NaN          NaN               NaN
9        10          S0002     P0051       2017-01-02  7.0    4.54     132.0  0.70    PR14          NaN          PR03          NaN          NaN               NaN

         Unnamed: 0  store_id  product_id  date        sales  revenue  stock  price   promo_type_1  promo_bin_1  promo_type_2  promo_bin_2  promo_discount_2  promo_discount_type_2
8886048  8886049     S0143     P0639       2019-10-31  NaN    NaN      NaN    9.75    PR14          NaN          PR03          NaN          NaN               NaN
8886049  8886050     S0143     P0642       2019-10-31  NaN    NaN      NaN    4.00    PR14          NaN          PR03          NaN          NaN               NaN
8886050  8886051     S0143     P0658       2019-10-31  NaN    NaN      NaN    41.50   PR14          NaN          PR03          NaN          NaN               NaN
8886051  8886052     S0143     P0663       2019-10-31  NaN    NaN      NaN    6.75    PR10          verylow      PR03          NaN          NaN               NaN
8886052  8886053     S0143     P0664       2019-10-31  NaN    NaN      NaN    1.75    PR14          NaN          PR03          NaN          NaN               NaN
8886053  8886054     S0143     P0676       2019-10-31  NaN    NaN      NaN    19.90   PR03          verylow      PR03          NaN          NaN               NaN
8886054  8886055     S0143     P0680       2019-10-31  NaN    NaN      NaN    139.90  PR14          NaN          PR03          NaN          NaN               NaN
8886055  8886056     S0143     P0694       2019-10-31  NaN    NaN      NaN    7.50    PR14          NaN          PR03          NaN          NaN               NaN
8886056  8886057     S0143     P0718       2019-10-31  NaN    NaN      NaN    23.75   PR14          NaN          PR03          NaN          NaN               NaN
8886057  8886058     S0143     P0747       2019-10-31  NaN    NaN      NaN    21.90   PR14          NaN          PR03          NaN          NaN               NaN